Application of Ant-based Template Matching for Web Documents Categorization

Authors

  • Siok Lan Ong
  • Weng-Kin Lai
  • Tracy S. Y. Tai
  • Kok Meng Hoe
  • Choo Hau Ooi
Abstract

The self-organizing behavior exhibited by ants can be modeled to solve real-world clustering problems. The general idea of artificial ants walking around a search space and picking up or dropping items according to some probability measure has previously been examined for clustering large numbers of World Wide Web (WWW) documents. This paper extends that idea by directly applying template matching with a Gaussian Probability Surface (GPS), which constrains these multi-agents to form clusters in pre-defined areas of the workspace. The clustering performance of supervised ants using the GPS is compared with that of the typical ant clustering algorithm. Both are evaluated on the same dataset, a collection of multi-class web documents. Finally, the paper concludes with some recommendations for further investigation. Povzetek (Slovenian summary): Ant colony techniques were applied to the categorization of Internet documents.
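The abstract does not give the probability formulas, so the following is only a minimal sketch of how such a scheme might look. It assumes the classic pick-up and drop rules from the ant clustering literature (Deneubourg-style, with threshold constants `k1` and `k2`) and assumes that the Gaussian surface simply scales the drop probability by its height at the ant's position. The function names, the constants, and the way the two terms are combined are all illustrative assumptions, not the authors' method.

```python
import math


def pick_probability(f, k1=0.1):
    """Pick-up probability: close to 1 when the local similarity f is low.

    f is the fraction of similar items perceived around the ant (0 <= f <= 1);
    k1 is an assumed threshold constant.
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """Drop probability: close to 1 when the local similarity f is high.

    k2 is an assumed threshold constant.
    """
    return (f / (k2 + f)) ** 2


def gaussian_surface(x, y, cx, cy, sigma=5.0):
    """Height in (0, 1] of an isotropic Gaussian bump centred on (cx, cy)."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def templated_drop_probability(f, x, y, centre, sigma=5.0, k2=0.15):
    """Assumed combination: scale the drop probability by the surface height.

    Items are then most likely to be dropped near the pre-defined centre
    of their cluster area and unlikely to be dropped far away from it.
    """
    cx, cy = centre
    return drop_probability(f, k2) * gaussian_surface(x, y, cx, cy, sigma)
```

Under these assumptions, an ant standing on the template centre drops with the unmodified probability, while the same ant far from the centre almost never drops, which is one simple way to confine clusters to pre-defined areas of the workspace.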

Similar resources

Evaluation of Similarity Measures for Template Matching

Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...

A Hybrid Approach to Statistical and Semantical Analysis of Web Documents

This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template based clustering as well as semantical analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. In order to demonstrate the widespread practical benefit of our approach, we outline a combined mathematical and semant...

A New RSTB Invariant Image Template Matching Based on Log-Spectrum and Modified ICA

Template matching is a widely used technique in many image processing and machine vision applications. In this paper we propose a new, fast and reliable template matching algorithm which is invariant to Rotation, Scale, Translation and Brightness (RSTB) changes. For this purpose, we adopt the idea of ring projection transform (RPT) of image. In the proposed algorithm, two novel s...

Search Query Categorization at Scale

State-of-the-art query categorization methods usually exploit web search services to retrieve the best matching web documents and map them to a given taxonomy of categories. This is effective but impractical when one does not own a web corpus and has to use a third-party web search engine API. The problem lies in performance and in financial costs. In this paper, we present a novel, fast and scalab...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides for its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a pre-requisite for a project we have overtaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Journal:
  • Informatica (Slovenia)

Volume 29

Publication date: 2005